Learning aspect models with partially labeled data
Abstract
In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this work is to exploit the available unlabeled data, together with the set of labeled examples, to learn latent models whose structure and underlying hypotheses account for the document generation process more accurately than other mixture-based generative models. We present a semi-supervised variant of the Probabilistic Latent Semantic Analysis (PLSA) model (Hofmann, 2001). In our approach, we try to capture the data mislabeling errors that may occur during the training of our model. This is done by iteratively assigning class labels to unlabeled examples using the current aspect model and re-estimating the probabilities of the mislabeling errors. We perform experiments on the 20Newsgroups, WebKB and Reuters document collections, as well as on a real-world dataset from a Xerox Business Group, and show the effectiveness of our approach compared to a semi-supervised version of Naive Bayes, another semi-supervised version of PLSA, and transductive Support Vector Machines.
ISSN 0167-8655. doi:10.1016/j.patrec.2010.09.004. Corresponding author e-mail: akrithara@iit.demokritos.gr (Athens, Greece; Tel.: +302106503204). © 2010 Elsevier B.V. All rights reserved.
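The iterative procedure mentioned in the abstract (label the unlabeled documents with the current model, then re-estimate the mislabeling probabilities) can be illustrated with a rough sketch. This is not the authors' algorithm: it assumes a multinomial Naive Bayes model as a simple stand-in for the PLSA aspect model, and the function names, the matrix `beta` and its update rule are hypothetical choices made only for illustration.

```python
import numpy as np

def fit_nb(X, R, alpha=1.0):
    """Fit multinomial Naive Bayes from soft class responsibilities R (docs x classes)."""
    prior = R.sum(axis=0) + alpha
    prior = prior / prior.sum()
    word_counts = R.T @ X + alpha                      # classes x words, smoothed
    word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)
    return np.log(prior), np.log(word_probs)

def posterior(X, log_prior, log_word_probs):
    """Return P(class | document) for each row of the count matrix X."""
    scores = X @ log_word_probs.T + log_prior
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

def self_train_with_mislabeling(X_lab, y_lab, X_unl, n_classes, n_iter=10):
    """Hypothetical self-training loop with a mislabeling matrix.

    beta[k, j] approximates P(assigned label k | true class j).
    """
    onehot = np.eye(n_classes)[np.asarray(y_lab)]
    params = fit_nb(X_lab, onehot)
    beta = np.eye(n_classes)                           # start by assuming no errors
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        post = posterior(X_unl, *params)               # current beliefs on unlabeled docs
        assigned = post.argmax(axis=1)                 # hard labels from the current model
        # Re-estimate mislabeling probabilities from assigned labels and beliefs.
        A = np.eye(n_classes)[assigned]
        joint = A.T @ post + 1e-9                      # [k, j]: mass of true j assigned k
        beta = joint / joint.sum(axis=0, keepdims=True)
        # Correct the responsibilities using the estimated error rates.
        resp = post * beta[assigned]
        resp = resp / resp.sum(axis=1, keepdims=True)
        params = fit_nb(X_all, np.vstack([onehot, resp]))
    return params, beta
```

Calling `self_train_with_mislabeling` with word-count matrices for the labeled and unlabeled documents returns the fitted parameters and the estimated mislabeling matrix. Each column of `beta` sums to one and stays close to the identity when the assigned labels agree with the model's beliefs.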
Similar resources
Learning Hybrid Models for Image Annotation with Partially Labeled Data
Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image...
Reachability checking in complex and concurrent software systems using intelligent search methods
Software system verification is an efficient technique for ensuring the correctness of a software product, especially in safety-critical systems in which a small bug may have disastrous consequences. The goal of software verification is to ensure that the product fulfills the requirements. Studies show that the cost of finding and fixing errors in design time is less than finding and fixing the...
An Extension of the Aspect PLSA Model to Active and Semi-Supervised Learning for Text Classification
In this paper, we address the problem of learning aspect models with partially labeled examples. We propose a method which benefits from both semi-supervised and active learning frameworks. In particular, we combine a semi-supervised extension of the PLSA algorithm [11] with two active learning techniques. We perform experiments over four different datasets and show the effectiveness of the com...
Diverse reduct subspaces based co-training for partially labeled data
Keywords: Rough set theory; Markov blanket; Attribute reduction; Rough co-training; Partially labeled data. Rough set theory is an effective supervised learning model for labeled data. However, it is often the case that practical problems involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partiall...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated by various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by gradual forgetting. Concept drift refer...
Journal:
- Pattern Recognition Letters
Volume 32
Publication date: 2011